Search CORE

7 research outputs found

Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis

Author: Huang Sharon X.
Liu Yihao
Ni Haomiao
Xue Yuan
Publication date: 01/10/2022

In this paper, we propose a novel dual-branch Transformation-Synthesis network (TS-Net) for video motion retargeting. Given one subject video and one driving video, TS-Net can produce a new plausible video with the subject appearance of the subject video and the motion pattern of the driving video. TS-Net consists of a warp-based transformation branch and a warp-free synthesis branch. The novel design of dual branches combines the strengths of deformation-grid-based transformation and warp-free generation for better identity preservation and robustness to occlusion in the synthesized videos. A mask-aware similarity module is further introduced to the transformation branch to reduce computational overhead. Experimental results on face and dance datasets show that TS-Net achieves better performance in video motion retargeting than several state-of-the-art models as well as its single-branch variants. Our code is available at https://github.com/nihaomiao/WACV23_TSNet.
Comment: WACV 202

arXiv.org e-Print Archive

Synthetic Augmentation with Large-scale Unconditional Pre-training

Author: Huang Sharon X.
Jin Peng
Ni Haomiao
Xue Yuan
Ye Jiarong
Publication date: 07/08/2023

Deep-learning-based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate the issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods heavily depends on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. In particular, we train a latent diffusion model (LDM) on diverse unlabeled datasets to learn common features and generate realistic images without conditional inputs. Then, we fine-tune the model with classifier guidance in latent space on an unseen labeled dataset so that the model can synthesize images of specific categories. Additionally, we adopt a selective mechanism to add only synthetic samples that match their target labels with high confidence. We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets. With HistoDiffusion augmentation, the classification accuracy of a backbone classifier is remarkably improved by 6.4% using a small set of the original labels. Our code is available at https://github.com/karenyyy/HistoDiffAug.
Comment: MICCAI 202

arXiv.org e-Print Archive

Conditional Image-to-Video Generation with Latent Flow Diffusion Models

Author: Huang Sharon X.
Li Kai
Min Martin Renqiang
Ni Haomiao
Shi Changhao
Publication date: 23/03/2023

Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal dynamics corresponding to the given image and condition. In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. Compared to previous direct-synthesis-based works, our proposed LFDM can better synthesize spatial details and temporal motion by fully utilizing the spatial content of the given image and warping it in the latent space according to the generated temporally-coherent flow. The training of LFDM consists of two separate stages: (1) an unsupervised learning stage to train a latent flow auto-encoder for spatial content generation, including a flow predictor to estimate latent flow between pairs of video frames, and (2) a conditional learning stage to train a 3D-UNet-based diffusion model (DM) for temporal latent flow generation. Unlike previous DMs operating in pixel space or latent feature space that couples spatial and temporal information, the DM in our LFDM only needs to learn a low-dimensional latent flow space for motion generation, thus being more computationally efficient. We conduct comprehensive experiments on multiple datasets, where LFDM consistently outperforms prior art. Furthermore, we show that LFDM can be easily adapted to new domains by simply fine-tuning the image decoder. Our code is available at https://github.com/nihaomiao/CVPR23_LFDM.
Comment: CVPR 202

arXiv.org e-Print Archive

Electrical conduction and reaction analysis on mixed-conducting iron-doped barium zirconate

Author: Adijanto
Basu
Benjamin A. Wilhite
Bierschenk
Bouwmeester
Brisse
Clarke
Esposito
Fagg
Haomiao Zhang
Holladay
Hwang
Ishihara
Jensen
Jensen
Kim-Lohsoontorn
Kreuer
Kröger
Laguna-Bercero
Leo
Li
Li
Löfberg
Minh
Mizusaki
Mogensen
Moos
Ni
Norby
Ortiz-Landeros
Qi
Qi
Ruckenstein
Rudham
Rui
Ruiz-Morales
Sahoo
Schiller
Skinner
Song
Song
Sunarso
Suresh
Tanaka
Tsai
Tsakoumis
Tuller
Yang
Zhan
Zhang
Publication venue: Elsevier BV

Crossref